Overview
The Angiotensin Converting Enzyme Inhibition, ACE(I), on Diabetic Nephropathy Trial was a prospective, double-blind, randomized, controlled clinical trial comparing the effect of captopril, an ACE inhibitor, with that of placebo in slowing the progression of renal disease in people with insulin-dependent diabetes mellitus (IDDM).
Four hundred nine (409) individuals participated in the original study at 30 centers between December 1987 and October 1990. The analysis data set for this assignment includes data from 350 participants. The primary end point of the trial was doubling of the baseline serum creatinine concentration. The main results of the ACE(I) clinical trial were published in the article ‘The Effect of Angiotensin-Converting-Enzyme Inhibition on Diabetic Nephropathy’ (NEJM. 329: 1456-1462. November 1993). Edmund Lewis, M.D., and Raymond P. Bain, Ph.D., PI and co-PI of the ACE(I) clinical trial, respectively, released the original SAS data sets of the trial for data management and data analysis in the Advanced Epidemiologic Data Analysis (PUBH_6260) course at George Washington University.
The original SAS data set has been modified for the purposes of this assignment.
Serum creatinine is a measure of renal function with higher values indicating poorer kidney function. The use of captopril was intended to reduce the likelihood of doubling of serum creatinine during the study, indicating less progression of renal disease.
Methods
Data Importing and Inspection
The dataset used for this analysis was derived from the Angiotensin Converting Enzyme Inhibition (ACE[I]) trial, a double-blind, randomized controlled trial assessing the effect of captopril on renal disease progression in patients with insulin-dependent diabetes mellitus (IDDM). The analysis dataset (\(\texttt{ACE.csv}\)), containing 350 observations and nine demographic and clinical variables, was imported into the statistical software R version 4.5.1. Data inspection was conducted to confirm successful importation, check for missing values, ensure correct variable types (numeric or character), and verify coding consistency. Value labels (e.g., for \(\texttt{TXGRP}\), the treatment group) were applied to coded variables to enhance interpretability.
Summary statistics and Research Questions
The primary research question examined whether treatment with captopril reduced the risk of doubling of serum creatinine compared to placebo. Secondary analyses explored (1) whether baseline mean arterial pressure differed between smokers and non-smokers, and (2) whether a correlation existed between baseline serum creatinine and age. Summary statistics were calculated for all variables using descriptive procedures, including means and standard deviations for continuous variables (e.g., BASEMAP, BASESCR, AGE) and frequency distributions for categorical variables (e.g., SEX, TXGRP, SMOKER). These summaries provided an overview of baseline characteristics and facilitated assessment of data balance between treatment groups.
Data Visualization and Inferential Analyses
Data visualization techniques were employed to describe and illustrate variable distributions and relationships. Histograms and boxplots depicted the distribution of continuous variables, while bar charts summarized categorical variable frequencies. For the primary outcome, a logistic regression was applied to compare the proportion of patients with doubled serum creatinine between the captopril and placebo groups. The secondary question on mean arterial pressure differences between smokers and non-smokers was evaluated using an independent samples t-test (provided its assumptions were met). The correlation between baseline serum creatinine and age was assessed using Pearson’s correlation coefficient. Statistical significance was defined at α = 0.05, and results were reported with corresponding confidence intervals.
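The three planned analyses can be sketched in base R as follows. The data frame is simulated purely for illustration: the name `sim`, the seed, and every value are assumptions rather than trial data, so the printed estimates say nothing about the ACE(I) results. Only the variable names follow the data dictionary.

```r
# Simulated stand-in data: NOT the trial data, for illustration only
set.seed(6260)
n <- 350
sim <- data.frame(
  TXGRP   = factor(sample(c("Captopril", "Placebo"), n, replace = TRUE),
                   levels = c("Placebo", "Captopril")),  # placebo as reference
  DOUBLE  = factor(sample(c("NO", "YES"), n, replace = TRUE),
                   levels = c("NO", "YES")),             # models P(DOUBLE == "YES")
  SMOKER  = factor(sample(c("NO", "YES"), n, replace = TRUE)),
  BASEMAP = rnorm(n, mean = 103, sd = 12),
  BASESCR = runif(n, min = 0.5, max = 2.5),
  AGE     = runif(n, min = 18, max = 49)
)

# Primary question: logistic regression of doubling on treatment group
fit <- glm(DOUBLE ~ TXGRP, data = sim, family = binomial)
# Odds ratios with Wald 95% confidence limits
or_ci <- exp(cbind(OR = coef(fit), confint.default(fit)))

# Secondary question 1: two-sample t-test of BASEMAP by smoking status
tt <- t.test(BASEMAP ~ SMOKER, data = sim)

# Secondary question 2: Pearson correlation between BASESCR and AGE
ct <- cor.test(sim$BASESCR, sim$AGE, method = "pearson")

print(or_ci)
print(tt$conf.int)
print(ct$estimate)
```

Note that `t.test()` defaults to the Welch version, which does not assume equal variances; passing `var.equal = TRUE` gives the classical pooled-variance test if that assumption is judged reasonable.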
Import Libraries
Below we import the necessary libraries for data manipulation, visualization, and statistical analysis.
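A minimal sketch of the loading step, assuming the packages are the ones that appear in the code later in this report (`ggplot2`, `psych`, `ggthemes`, `plotly`). Two entries are assumptions: `readr` is inferred from the `spec_tbl_df` class in the structure listing, and `magrittr` from the use of the `%>%` operator.

```r
# Packages referenced in this report's code; readr and magrittr are assumptions
pkgs <- c("readr", "magrittr", "ggplot2", "psych", "ggthemes", "plotly")

# Check which packages are installed before trying to attach them
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
if (any(!installed)) {
  message("Not installed: ", paste(pkgs[!installed], collapse = ", "))
}

# Attach only the packages that are available
for (p in pkgs[installed]) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}
```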
Set working directory
Importing the data
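The import step might look like the sketch below. So that it runs without the real \(\texttt{ACE.csv}\), it first writes a two-row stand-in file to a temporary path; the rows are taken from the rounded values shown in the structure listing later in this report. The sketch uses base `read.csv()`, whereas the `spec_tbl_df` class in that listing suggests the report itself used `readr::read_csv()` (an inference, not something the report states).

```r
# Write a tiny stand-in file (two rows, rounded values) to a temporary path
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "BGIRA,BASEMAP,BPCAT,SEX,AGE,BASESCR,TXGRP,DOUBLE,SMOKER",
  "1,114,3,FEMALE,38.7,1.35,2,YES,YES",
  "1,115,3,MALE,43.7,1.55,1,YES,YES"
), csv_path)

# Import; with the real data, replace csv_path with the path to ACE.csv
dfr <- read.csv(csv_path, stringsAsFactors = FALSE)
str(dfr)
```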
Data Sanity Check
The dataset has \(350\) observations and \(9\) variables.
It contains no missing values.
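Checks of this kind can be done with `dim()` and `colSums(is.na())`. The sketch below uses a small hypothetical data frame, `toy`, with one deliberately missing value so that the missing-value count is visibly non-zero; applied to the analysis data, every count would be zero.

```r
# Toy stand-in for the analysis data (hypothetical values, one missing AGE)
toy <- data.frame(
  AGE   = c(38.7, NA, 33.7),
  SEX   = c("FEMALE", "MALE", "FEMALE"),
  TXGRP = c(2, 1, 2)
)

dim(toy)                           # number of rows and columns
na_counts <- colSums(is.na(toy))   # missing values per variable
print(na_counts)
anyNA(toy)                         # TRUE if any value is missing
```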
Let us take a look at the structure of our dataset.
spc_tbl_ [350 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ BGIRA : num [1:350] 1 1 1 1 1 3 1 1 1 1 ...
$ BASEMAP: num [1:350] 114 115 110 111 104 ...
$ BPCAT : num [1:350] 3 3 3 3 3 3 3 3 3 3 ...
$ SEX : chr [1:350] "FEMALE" "MALE" "FEMALE" "MALE" ...
$ AGE : num [1:350] 38.7 43.7 33.7 37.4 42.1 ...
$ BASESCR: num [1:350] 1.35 1.55 2.25 1.73 1.38 ...
$ TXGRP : num [1:350] 2 1 2 2 2 2 2 1 1 2 ...
$ DOUBLE : chr [1:350] "YES" "YES" "YES" "YES" ...
$ SMOKER : chr [1:350] "YES" "YES" "YES" "NO" ...
- attr(*, "spec")=
.. cols(
.. BGIRA = col_double(),
.. BASEMAP = col_double(),
.. BPCAT = col_double(),
.. SEX = col_character(),
.. AGE = col_double(),
.. BASESCR = col_double(),
.. TXGRP = col_double(),
.. DOUBLE = col_character(),
.. SMOKER = col_character()
.. )
- attr(*, "problems")=<externalptr>
The properties of the \(9\) variables are tabulated below.
| Variable | Description | Properties |
|---|---|---|
| \(\texttt{AGE}\) | Age of participant | years |
| \(\texttt{BGIRA}\) | Race | 1 = White; 2 = Black; 3, 4, or 5 = Other |
| \(\texttt{SMOKER}\) | Smoking status | “YES” or “NO” |
| \(\texttt{SEX}\) | Sex | “MALE” or “FEMALE” |
| \(\texttt{BPCAT}\) | Blood pressure category at baseline visit | 1 = Normal; 2 = Borderline; 3 = Hypertensive |
| \(\texttt{TXGRP}\) | Treatment group | 1 = Captopril; 2 = Placebo |
| \(\texttt{BASESCR}\) | Baseline serum creatinine | mg/dL |
| \(\texttt{BASEMAP}\) | Mean arterial pressure at the baseline visit | mm Hg |
| \(\texttt{DOUBLE}\) | Doubling of serum creatinine over the course of the study | “YES” or “NO” |
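The numeric codes in the table can be turned into labelled factors with `factor()`. The sketch below is one way to do it, using a small hypothetical data frame, `codes`; the labels come from the table, and collapsing codes 3 to 5 into a single "Other" level is an assumption about how the report handled race.

```r
# Small stand-in holding the numeric codes from the table (values hypothetical)
codes <- data.frame(
  TXGRP = c(2, 1, 2, 1),
  BPCAT = c(3, 1, 2, 3),
  BGIRA = c(1, 2, 3, 5)
)

labelled <- within(codes, {
  TXGRP <- factor(TXGRP, levels = c(1, 2), labels = c("Captopril", "Placebo"))
  BPCAT <- factor(BPCAT, levels = 1:3,
                  labels = c("Normal", "Borderline", "Hypertensive"))
  # Codes 3, 4 and 5 all map to "Other"
  BGIRA <- factor(ifelse(BGIRA >= 3, 3, BGIRA), levels = 1:3,
                  labels = c("White", "Black", "Other"))
})
print(labelled)
```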
Descriptive Statistics
Summary Statistics for Continuous Variables
The mean, median, and standard deviation for the measures of \(\texttt{AGE}\), mean arterial pressure at the baseline (\(\texttt{BASEMAP}\)), and serum creatinine at the baseline visit (\(\texttt{BASESCR}\)) overall and stratified by treatment group are summarized below.
# Summary statistics of AGE, BASEMAP and BASESCR by treatment group
psych::describeBy(dfr$AGE, group = dfr$TXGRP)
psych::describeBy(dfr$BASEMAP, group = dfr$TXGRP)
psych::describeBy(dfr$BASESCR, group = dfr$TXGRP)
Descriptive statistics by group
group: Captopril
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 173 34.96 7.31 35.07 34.85 7.74 20.54 48.97 28.44 0.1 -0.88 0.56
------------------------------------------------------------
group: Placebo
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 177 33.95 7.57 33.72 33.88 8.06 18.28 49 30.72 0.08 -0.82 0.57
Descriptive statistics by group
group: Captopril
vars n mean sd median trimmed mad min max range skew kurtosis
X1 1 173 102.17 11.78 103 102.09 10.38 72 136.67 64.67 0.13 0.14
se
X1 0.9
------------------------------------------------------------
group: Placebo
vars n mean sd median trimmed mad min max range skew kurtosis
X1 1 177 103.72 12.56 103.33 103.37 12.85 69 140.67 71.67 0.27 0.03
se
X1 0.94
Descriptive statistics by group
group: Captopril
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 173 1.27 0.42 1.2 1.23 0.44 0.58 2.3 1.72 0.69 -0.49 0.03
------------------------------------------------------------
group: Placebo
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 177 1.26 0.41 1.18 1.24 0.41 0.55 2.5 1.95 0.58 -0.43 0.03
Summary Statistics for Categorical Variables
The frequency and proportion for the categorical variables \(\texttt{SEX}\), \(\texttt{SMOKER}\), \(\texttt{BPCAT}\), and \(\texttt{BGIRA}\) stratified by treatment group are summarized below.
TXGRP
BPCAT Captopril Placebo
Normal 21 15
Borderline 66 73
Hypertensive 86 89
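Counts like these are easier to compare as within-arm percentages. The sketch below re-enters the counts from the table above and computes column percentages with `prop.table()`; the object names are illustrative.

```r
# Counts copied from the BPCAT-by-TXGRP table above (filled column by column)
bp_tab <- as.table(matrix(
  c(21, 66, 86,    # Captopril
    15, 73, 89),   # Placebo
  nrow = 3,
  dimnames = list(BPCAT = c("Normal", "Borderline", "Hypertensive"),
                  TXGRP = c("Captopril", "Placebo"))
))

addmargins(bp_tab)   # counts with row and column totals

# Percentage of each blood pressure category within each treatment arm
col_pct <- round(100 * prop.table(bp_tab, margin = 2), 1)
print(col_pct)
```

The column totals (173 and 177) match the group sizes reported in the continuous-variable summaries.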
Data Visualization
Data Visualization for our Continuous Variables
Below are the boxplots and histograms for the continuous variables \(\texttt{AGE}\), \(\texttt{BASEMAP}\), and \(\texttt{BASESCR}\) stratified by treatment group.
Figure 1 shows the distribution of \(\texttt{AGE}\) by treatment group (\(\texttt{TXGRP}\)).
# Boxplot of AGE by treatment group; assumes TXGRP has been recoded to a factor
AGE_TXGRP <- ggplot2::ggplot(
  dfr,
  ggplot2::aes(x = TXGRP, y = AGE)
) +
  ggplot2::geom_boxplot(
    ggplot2::aes(fill = TXGRP),
    show.legend = FALSE
  ) +
  ggplot2::labs(
    title = "Distribution of AGE by Treatment Group",
    subtitle = "Comparison between Treatment groups"
  ) +
  ggplot2::scale_fill_brewer(palette = "Set1", direction = -1) +
  ggplot2::xlab("Treatment Groups") +
  ggplot2::ylab("Age") +
  ggthemes::theme_clean()
# Convert the static plot to an interactive one
plotly::ggplotly(AGE_TXGRP)
Figure 2 shows the distribution of \(\texttt{BASEMAP}\) by treatment group (\(\texttt{TXGRP}\)).
# Boxplot of BASEMAP by treatment group; assumes TXGRP has been recoded to a factor
BASEMAP_TXGRP <- ggplot2::ggplot(
  dfr,
  ggplot2::aes(x = TXGRP, y = BASEMAP)
) +
  ggplot2::geom_boxplot(
    ggplot2::aes(fill = TXGRP),
    show.legend = FALSE
  ) +
  ggplot2::labs(
    title = "Distribution of BASEMAP by Treatment Group",
    subtitle = "Comparison between Treatment groups"
  ) +
  ggplot2::scale_fill_brewer(palette = "Set1", direction = -1) +
  ggplot2::xlab("TXGRP") +
  ggplot2::ylab("BASEMAP") +
  ggthemes::theme_clean()
# Convert the static plot to an interactive one
plotly::ggplotly(BASEMAP_TXGRP)
Figure 3 shows the distribution of \(\texttt{BASESCR}\) by treatment group (\(\texttt{TXGRP}\)).
# Boxplot of BASESCR by treatment group; assumes TXGRP has been recoded to a factor
BASESCR_TXGRP <- ggplot2::ggplot(
  dfr,
  ggplot2::aes(x = TXGRP, y = BASESCR)
) +
  ggplot2::geom_boxplot(
    ggplot2::aes(fill = TXGRP),
    show.legend = FALSE
  ) +
  ggplot2::labs(
    title = "Distribution of BASESCR by Treatment Group",
    subtitle = "Comparison between Treatment groups"
  ) +
  ggplot2::scale_fill_brewer(palette = "Set1", direction = -1) +
  ggplot2::xlab("TXGRP") +
  ggplot2::ylab("BASESCR") +
  ggthemes::theme_clean()
# Convert the static plot to an interactive one
plotly::ggplotly(BASESCR_TXGRP)
A scatterplot that illustrates the relationship between \(\texttt{AGE}\) and \(\texttt{BASEMAP}\) is shown in Figure 4.
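A scatterplot of this kind could be drawn as in the sketch below. It uses base graphics to avoid extra dependencies, whereas the report's other figures use ggplot2, and the plotted values are simulated stand-ins (hypothetical, not trial data), so the pattern shown carries no meaning.

```r
# Simulated stand-in values (hypothetical); with the real data use
# dfr$AGE and dfr$BASEMAP instead
set.seed(4)
age     <- runif(100, min = 18, max = 49)
basemap <- rnorm(100, mean = 103, sd = 12)

# Scatterplot of baseline mean arterial pressure against age
plot(age, basemap,
     xlab = "Age (years)",
     ylab = "Baseline mean arterial pressure (mm Hg)",
     main = "BASEMAP versus AGE",
     pch = 19, col = "steelblue")

# Least-squares trend line for visual reference
abline(lm(basemap ~ age), lty = 2)
```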
Data Visualization for our Categorical Variables
The mosaic plot in Figure 5 illustrates the relationship between \(\texttt{TXGRP}\) and \(\texttt{SEX}\).
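A mosaic plot can be produced from a two-way table with base R's `mosaicplot()`. The sketch below is an illustration under assumptions: the data frame `sim_cat` is simulated (not trial data), and the title and colours are arbitrary choices.

```r
# Simulated stand-in data (hypothetical); with the real data use
# dfr$TXGRP and dfr$SEX instead
set.seed(5)
sim_cat <- data.frame(
  TXGRP = sample(c("Captopril", "Placebo"), 350, replace = TRUE),
  SEX   = sample(c("FEMALE", "MALE"), 350, replace = TRUE)
)

# Two-way frequency table of treatment group by sex
tab <- table(TXGRP = sim_cat$TXGRP, SEX = sim_cat$SEX)

# Tile areas are proportional to the cell counts
mosaicplot(tab, main = "Treatment group by sex",
           color = c("grey80", "grey40"))
```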